Overview

Dataset statistics

Number of variables: 18
Number of observations: 14388
Missing cells: 6952
Missing cells (%): 2.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.0 MiB
Average record size in memory: 144.0 B
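The percentages and sizes above can be cross-checked from the raw counts. The following is a small sketch of that arithmetic; the variable names are illustrative, and it assumes the missing-cell percentage is taken over all cells (variables times observations) and that the total memory figure is the record size times the number of rows.

```python
# Figures copied from the dataset statistics above.
n_variables = 18
n_observations = 14388
missing_cells = 6952
record_bytes = 144.0

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: total memory is record size times row count, shown in MiB.
total_mib = record_bytes * n_observations / 2**20

print(f"{total_cells} cells, {missing_pct:.1f}% missing, {total_mib:.1f} MiB")
```

Both derived values round to the figures shown in the report.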

Variable types

NUM: 12
CAT: 5
BOOL: 1

Reproduction

Analysis started: 2020-06-06 04:35:04.384899
Analysis finished: 2020-06-06 04:35:30.621435
Duration: 26.24 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

target has 6952 (48.3%) missing values [Missing]
params0 is highly skewed (γ1 = -22.49707985) [Skewed]
params1 is highly skewed (γ1 = 63.68227615) [Skewed]
params3 is highly skewed (γ1 = 68.59503099) [Skewed]
params4 is highly skewed (γ1 = 43.40645376) [Skewed]
params0 has unique values [Unique]
rms has unique values [Unique]
spectrum_filename has unique values [Unique]
spectrum_id has unique values [Unique]
layout_x has 398 (2.8%) zeros [Zeros]
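For readers who want to reproduce the skewness warnings, here is a minimal sketch of a sample skewness (γ1) calculation. It assumes the adjusted Fisher-Pearson estimator, which is what pandas uses for `Series.skew()`. The cutoff of 20 is an assumption that is consistent with this report: every flagged column has |γ1| above 20, while params6 (6.41) and rms (4.77) are not flagged. The function names are illustrative.

```python
import math

def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness (assumed estimator, see lead-in)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data has no skew
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

def is_highly_skewed(gamma1, threshold=20.0):
    """Hypothetical warning rule: flag when |gamma1| exceeds the threshold."""
    return abs(gamma1) > threshold

print(is_highly_skewed(-22.49707985))  # params0
print(is_highly_skewed(6.411890542))   # params6
```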

Variables

beta
Real number (ℝ≥0)

Distinct count: 12920
Unique (%): 89.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.3398781139764335
Minimum: 3.3369076773278623e-37
Maximum: 1.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 3.336907677e-37
5-th percentile: 2.397680408e-15
Q1: 0.04737269486
median: 0.2307974401
Q3: 0.546055588
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.4986828931

Descriptive statistics

Standard deviation: 0.3389495033
Coefficient of variation (CV): 0.9972678125
Kurtosis: -0.6959015507
Mean: 0.339878114
Median Absolute Deviation (MAD): 0.2160081881
Skewness: 0.804329633
Sum: 4890.166304
Variance: 0.1148867658
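Several of the statistics above are derived from others. The sketch below recomputes them for beta from the reported values, using the usual definitions (IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count). The variable names are illustrative.

```python
# Reported values for beta, copied from the tables above.
q1, q3 = 0.04737269486, 0.546055588
mean, std = 0.3398781139764335, 0.3389495033
n_observations = 14388

iqr = q3 - q1                    # interquartile range
cv = std / mean                  # coefficient of variation
variance = std ** 2
total = mean * n_observations    # sum of all values

print(f"IQR={iqr:.10f} CV={cv:.7f} Var={variance:.10f} Sum={total:.6f}")
```

The results agree with the reported IQR, CV, variance, and sum up to rounding.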
Histogram with fixed size bins (bins=10)
Value                   Count    Frequency (%)
1                       1387     9.6%
1                       28       0.2%
1                       18       0.1%
1                       15       0.1%
1                       12       0.1%
1                       5        < 0.1%
1                       4        < 0.1%
1                       3        < 0.1%
1                       3        < 0.1%
1                       2        < 0.1%
Other values (12910)    12911    89.7%

Value                   Count    Frequency (%)
3.336907677e-37         1        < 0.1%
5.533556644e-33         1        < 0.1%
1.262773621e-30         1        < 0.1%
3.481021966e-30         1        < 0.1%
6.42219085e-30          1        < 0.1%

Value                   Count    Frequency (%)
1                       1387     9.6%
1                       28       0.2%
1                       18       0.1%
1                       5        < 0.1%
1                       15       0.1%

chip_id
Categorical

Distinct count: 9
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 112.4 KiB

3c2948d0a755e5ff99f6    3897
a005efe42b620a737e7e    2853
79ad4647da6de6425abf    1821
118c70535bd753a86615    1805
84b788fdc5e779f8a0df    1194
Other values (4)        2818

Value                   Count    Frequency (%)
3c2948d0a755e5ff99f6    3897     27.1%
a005efe42b620a737e7e    2853     19.8%
79ad4647da6de6425abf    1821     12.7%
118c70535bd753a86615    1805     12.5%
84b788fdc5e779f8a0df    1194     8.3%
6718e7f83c824b1e436d    1148     8.0%
0b9dbf13f938efd5717f    1008     7.0%
c695a1e61e002b34e556    460      3.2%
a948b8cdcd7957eb5c31    202      1.4%

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

exc_wl
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 112.4 KiB

850    8466
780    5922

Value    Count    Frequency (%)
850      8466     58.8%
780      5922     41.2%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

layout_a
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 112.4 KiB

2    3749
1    3712
3    3585
0    3342

Value    Count    Frequency (%)
2        3749     26.1%
1        3712     25.8%
3        3585     24.9%
0        3342     23.2%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

layout_x
Real number (ℝ≥0)

ZEROS

Distinct count: 48
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.49548234639978
Minimum: 0
Maximum: 47
Zeros: 398
Zeros (%): 2.8%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 10
median: 24
Q3: 37
95-th percentile: 45
Maximum: 47
Range: 47
Interquartile range (IQR): 27

Descriptive statistics

Standard deviation: 14.55929055
Coefficient of variation (CV): 0.6196634031
Kurtosis: -1.329048378
Mean: 23.49548235
Median Absolute Deviation (MAD): 13
Skewness: -0.0514120973
Sum: 338053
Variance: 211.9729413

Histogram with fixed size bins (bins=10)

Value                Count    Frequency (%)
0                    398      2.8%
1                    390      2.7%
4                    387      2.7%
3                    374      2.6%
38                   362      2.5%
33                   358      2.5%
46                   352      2.4%
40                   348      2.4%
44                   348      2.4%
2                    348      2.4%
Other values (38)    10723    74.5%

Value                Count    Frequency (%)
0                    398      2.8%
1                    390      2.7%
2                    348      2.4%
3                    374      2.6%
4                    387      2.7%

Value                Count    Frequency (%)
47                   263      1.8%
46                   352      2.4%
45                   336      2.3%
44                   348      2.4%
43                   305      2.1%

layout_y
Real number (ℝ≥0)

Distinct count: 192
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 97.44683069224354
Minimum: 0
Maximum: 191
Zeros: 56
Zeros (%): 0.4%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 8
Q1: 46
median: 99
Q3: 148
95-th percentile: 183
Maximum: 191
Range: 191
Interquartile range (IQR): 102

Descriptive statistics

Standard deviation: 57.22705902
Coefficient of variation (CV): 0.5872644458
Kurtosis: -1.271339976
Mean: 97.44683069
Median Absolute Deviation (MAD): 51
Skewness: -0.05461754493
Sum: 1402065
Variance: 3274.936285

Histogram with fixed size bins (bins=10)

Value                 Count    Frequency (%)
143                   111      0.8%
7                     111      0.8%
181                   102      0.7%
179                   101      0.7%
186                   101      0.7%
187                   99       0.7%
50                    98       0.7%
6                     98       0.7%
171                   98       0.7%
147                   97       0.7%
Other values (182)    13372    92.9%

Value                 Count    Frequency (%)
0                     56       0.4%
1                     82       0.6%
2                     80       0.6%
3                     91       0.6%
4                     61       0.4%

Value                 Count    Frequency (%)
191                   82       0.6%
190                   86       0.6%
189                   86       0.6%
188                   73       0.5%
187                   99       0.7%

params0
Real number (ℝ)

SKEWED
UNIQUE

Distinct count: 14388
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -49.48396538142775
Minimum: -162480.31288668307
Maximum: 2850.6578511143766
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: -162480.3129
5-th percentile: -27.09550743
Q1: 46.31447739
median: 157.5178626
Q3: 304.0492356
95-th percentile: 493.3347143
Maximum: 2850.657851
Range: 165330.9707
Interquartile range (IQR): 257.7347582

Descriptive statistics

Standard deviation: 3726.311048
Coefficient of variation (CV): -75.30340423
Kurtosis: 651.4684634
Mean: -49.48396538
Median Absolute Deviation (MAD): 123.2630787
Skewness: -22.49707985
Sum: -711975.2939
Variance: 13885394.03

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
-65.9887858             1        < 0.1%
29.20133243             1        < 0.1%
332.4081602             1        < 0.1%
148.1578609             1        < 0.1%
94.27161392             1        < 0.1%
360.3706483             1        < 0.1%
73.80165966             1        < 0.1%
203.6651115             1        < 0.1%
24.46201414             1        < 0.1%
373.187825              1        < 0.1%
Other values (14378)    14378    99.9%

Value                   Count    Frequency (%)
-162480.3129            1        < 0.1%
-124965.8336            1        < 0.1%
-119741.3142            1        < 0.1%
-117100.2725            1        < 0.1%
-94136.75426            1        < 0.1%

Value                   Count    Frequency (%)
2850.657851             1        < 0.1%
2205.008677             1        < 0.1%
1993.97377              1        < 0.1%
1956.267407             1        < 0.1%
1868.567396             1        < 0.1%

params1
Real number (ℝ≥0)

SKEWED

Distinct count: 14326
Unique (%): 99.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 174344.35260081355
Minimum: 1.2464153485861729e-29
Maximum: 507377792.1048709
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 1.246415349e-29
5-th percentile: 1e-10
Q1: 1080.285552
median: 7001.471568
Q3: 24400.06786
95-th percentile: 85043.32888
Maximum: 507377792.1
Range: 507377792.1
Interquartile range (IQR): 23319.7823

Descriptive statistics

Standard deviation: 6016008.027
Coefficient of variation (CV): 34.50646916
Kurtosis: 4568.134139
Mean: 174344.3526
Median Absolute Deviation (MAD): 6751.647965
Skewness: 63.68227615
Sum: 2508466545
Variance: 3.619235258e+13

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
1e-10                   63       0.4%
953.5810253             1        < 0.1%
888.4244502             1        < 0.1%
52.86028431             1        < 0.1%
6218.322916             1        < 0.1%
42281.71415             1        < 0.1%
327.6892515             1        < 0.1%
82265.81529             1        < 0.1%
9633.732553             1        < 0.1%
19067.91491             1        < 0.1%
Other values (14316)    14316    99.5%

Value                   Count    Frequency (%)
1.246415349e-29         1        < 0.1%
3.967801941e-24         1        < 0.1%
6.255908306e-23         1        < 0.1%
3.44452566e-21          1        < 0.1%
5.262123287e-21         1        < 0.1%

Value                   Count    Frequency (%)
507377792.1             1        < 0.1%
329748667.9             1        < 0.1%
252597046.8             1        < 0.1%
251951190.4             1        < 0.1%
59014441.2              1        < 0.1%

params2
Real number (ℝ≥0)

Distinct count: 14203
Unique (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1245.5799411067878
Minimum: 1000.0
Maximum: 1599.9999999999998
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 1000
5-th percentile: 1076.305323
Q1: 1090.07876
median: 1231.312356
Q3: 1362.177354
95-th percentile: 1515.033737
Maximum: 1600
Range: 600
Interquartile range (IQR): 272.0985933

Descriptive statistics

Standard deviation: 150.7385884
Coefficient of variation (CV): 0.1210187989
Kurtosis: -0.8645515764
Mean: 1245.579941
Median Absolute Deviation (MAD): 138.4313872
Skewness: 0.413404394
Sum: 17921404.19
Variance: 22722.12203

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
1597                    71       0.5%
1003                    60       0.4%
1000                    8        0.1%
1600                    7        < 0.1%
1289.066                4        < 0.1%
1000                    4        < 0.1%
1142.906                3        < 0.1%
1165.92                 3        < 0.1%
1286.983                3        < 0.1%
1219.204                3        < 0.1%
Other values (14193)    14222    98.8%

Value                   Count    Frequency (%)
1000                    4        < 0.1%
1000                    1        < 0.1%
1000                    1        < 0.1%
1000                    8        0.1%
1000                    1        < 0.1%

Value                   Count    Frequency (%)
1600                    3        < 0.1%
1600                    1        < 0.1%
1600                    1        < 0.1%
1600                    1        < 0.1%
1600                    1        < 0.1%

params3
Real number (ℝ≥0)

SKEWED

Distinct count: 13269
Unique (%): 92.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.595020055030526
Minimum: 0.5000000000000001
Maximum: 4940.099845242949
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 0.5
5-th percentile: 0.5
Q1: 2
median: 4.588196205
Q3: 5.908641985
95-th percentile: 14.79655756
Maximum: 4940.099845
Range: 4939.599845
Interquartile range (IQR): 3.908641985

Descriptive statistics

Standard deviation: 63.14547827
Coefficient of variation (CV): 9.57472119
Kurtosis: 4855.784601
Mean: 6.595020055
Median Absolute Deviation (MAD): 2.588196205
Skewness: 68.59503099
Sum: 94889.14855
Variance: 3987.351426

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
2                       709      4.9%
0.5                     285      2.0%
0.5                     12       0.1%
0.5                     11       0.1%
0.5                     9        0.1%
10                      9        0.1%
0.5                     8        0.1%
0.5                     7        < 0.1%
0.5                     6        < 0.1%
0.5                     6        < 0.1%
Other values (13259)    13326    92.6%

Value                   Count    Frequency (%)
0.5                     285      2.0%
0.5                     9        0.1%
0.5                     12       0.1%
0.5                     5        < 0.1%
0.5                     11       0.1%

Value                   Count    Frequency (%)
4940.099845             1        < 0.1%
4303.51195              1        < 0.1%
3655.770838             1        < 0.1%
733.8533249             1        < 0.1%
278.2816612             1        < 0.1%

params4
Real number (ℝ≥0)

SKEWED

Distinct count: 14201
Unique (%): 98.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 924154.1468560996
Minimum: 8.1584815152519085e-19
Maximum: 1283687252.4541855
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 8.158481515e-19
5-th percentile: 5e-13
Q1: 10601.72245
median: 21770.23528
Q3: 43253.49235
95-th percentile: 122748.5237
Maximum: 1283687252
Range: 1283687252
Interquartile range (IQR): 32651.7699

Descriptive statistics

Standard deviation: 19199989.3
Coefficient of variation (CV): 20.77574328
Kurtosis: 2321.441145
Mean: 924154.1469
Median Absolute Deviation (MAD): 13993.54673
Skewness: 43.40645376
Sum: 1.329672986e+10
Variance: 3.686395893e+14

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
5e-13                   188      1.3%
36252.34344             1        < 0.1%
131977.4604             1        < 0.1%
6730.485518             1        < 0.1%
12076.93792             1        < 0.1%
11103.82959             1        < 0.1%
41107.85865             1        < 0.1%
19217.00308             1        < 0.1%
21468.15097             1        < 0.1%
41447.29833             1        < 0.1%
Other values (14191)    14191    98.6%

Value                   Count    Frequency (%)
8.158481515e-19         1        < 0.1%
3.1323902e-18           1        < 0.1%
4.074092224e-18         1        < 0.1%
4.742827243e-18         1        < 0.1%
6.693133025e-18         1        < 0.1%

Value                   Count    Frequency (%)
1283687252              1        < 0.1%
989515634.6             1        < 0.1%
819365760.1             1        < 0.1%
717043701.9             1        < 0.1%
573587229.3             1        < 0.1%

params5
Real number (ℝ≥0)

Distinct count: 14171
Unique (%): 98.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1249.4005912290147
Minimum: 1000.0
Maximum: 1599.9999999999998
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 1000
5-th percentile: 1003.216973
Q1: 1089.009329
median: 1232.251148
Q3: 1366.971049
95-th percentile: 1596.997198
Maximum: 1600
Range: 600
Interquartile range (IQR): 277.9617196

Descriptive statistics

Standard deviation: 164.0534498
Coefficient of variation (CV): 0.1313057245
Kurtosis: -0.747870197
Mean: 1249.400591
Median Absolute Deviation (MAD): 142.5137665
Skewness: 0.4621251171
Sum: 17976375.71
Variance: 26913.5344

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
1000                    67       0.5%
1600                    55       0.4%
1597                    40       0.3%
1003                    30       0.2%
1597                    18       0.1%
1003                    6        < 0.1%
1600                    5        < 0.1%
1600                    4        < 0.1%
1362.625441             1        < 0.1%
1406.703145             1        < 0.1%
Other values (14161)    14161    98.4%

Value                   Count    Frequency (%)
1000                    1        < 0.1%
1000                    1        < 0.1%
1000                    1        < 0.1%
1000                    1        < 0.1%
1000                    1        < 0.1%

Value                   Count    Frequency (%)
1600                    4        < 0.1%
1600                    1        < 0.1%
1600                    1        < 0.1%
1600                    1        < 0.1%
1600                    55       0.4%

params6
Real number (ℝ≥0)

Distinct count: 14382
Unique (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 88242.70284876494
Minimum: 0.5000000231602091
Maximum: 4883344.165783549
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 0.5000000232
5-th percentile: 5.48301671
Q1: 7.953383392
median: 11.3821437
Q3: 16.96238096
95-th percentile: 81697.06043
Maximum: 4883344.166
Range: 4883343.666
Interquartile range (IQR): 9.008997568

Descriptive statistics

Standard deviation: 523558.8148
Coefficient of variation (CV): 5.933168386
Kurtosis: 40.28262086
Mean: 88242.70285
Median Absolute Deviation (MAD): 3.900304466
Skewness: 6.411890542
Sum: 1269636009
Variance: 2.741138326e+11

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
10                      7        < 0.1%
13.53051325             1        < 0.1%
13.69064784             1        < 0.1%
2933.347307             1        < 0.1%
11.20838187             1        < 0.1%
14.81364459             1        < 0.1%
11.43754378             1        < 0.1%
26.3147406              1        < 0.1%
10.91354764             1        < 0.1%
24.79555413             1        < 0.1%
Other values (14372)    14372    99.9%

Value                   Count    Frequency (%)
0.5000000232            1        < 0.1%
0.5000077361            1        < 0.1%
0.5005436072            1        < 0.1%
0.6250719437            1        < 0.1%
0.6947331394            1        < 0.1%

Value                   Count    Frequency (%)
4883344.166             1        < 0.1%
4689999.77              1        < 0.1%
4002886.988             1        < 0.1%
4001311.603             1        < 0.1%
4001233.049             1        < 0.1%

pos_x
Real number (ℝ)

Distinct count: 14295
Unique (%): 99.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.877409570475395
Minimum: -1704.704
Maximum: 1698.19
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: -1704.704
5-th percentile: -1606.56925
Q1: -972.04915
median: 187.43875
Q3: 1052.58025
95-th percentile: 1569.99925
Maximum: 1698.19
Range: 3402.894
Interquartile range (IQR): 2024.6294

Descriptive statistics

Standard deviation: 1092.881469
Coefficient of variation (CV): 185.9461138
Kurtosis: -1.420822706
Mean: 5.87740957
Median Absolute Deviation (MAD): 995.47275
Skewness: -0.03974500368
Sum: 84564.1689
Variance: 1194389.905

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
1344.089                2        < 0.1%
1055.594                2        < 0.1%
1533.17                 2        < 0.1%
-1696.622               2        < 0.1%
1314.554                2        < 0.1%
1281.576                2        < 0.1%
-1149.854               2        < 0.1%
-1508.184               2        < 0.1%
1569.862                2        < 0.1%
-1667.831               2        < 0.1%
Other values (14285)    14368    99.9%

Value                   Count    Frequency (%)
-1704.704               1        < 0.1%
-1704.222               1        < 0.1%
-1704.037               1        < 0.1%
-1704.005               1        < 0.1%
-1702.777               1        < 0.1%

Value                   Count    Frequency (%)
1698.19                 1        < 0.1%
1693.647                1        < 0.1%
1688.86                 1        < 0.1%
1669.704                1        < 0.1%
1669.588                1        < 0.1%

rms
Real number (ℝ≥0)

UNIQUE

Distinct count: 14388
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.788097914002476
Minimum: 6.110799020422154
Maximum: 212.5098034623708
Zeros: 0
Zeros (%): 0.0%
Memory size: 112.4 KiB

Quantile statistics

Minimum: 6.11079902
5-th percentile: 7.241705084
Q1: 8.289552626
median: 9.822982525
Q3: 13.8692117
95-th percentile: 28.13839411
Maximum: 212.5098035
Range: 206.3990044
Interquartile range (IQR): 5.579659073

Descriptive statistics

Standard deviation: 8.394044549
Coefficient of variation (CV): 0.6563950797
Kurtosis: 48.21474878
Mean: 12.78809791
Median Absolute Deviation (MAD): 1.974224635
Skewness: 4.768462979
Sum: 183995.1528
Variance: 70.4599839

Histogram with fixed size bins (bins=10)

Value                   Count    Frequency (%)
10.19801741             1        < 0.1%
10.2893017              1        < 0.1%
7.261656752             1        < 0.1%
8.373738314             1        < 0.1%
8.8679784               1        < 0.1%
19.61848708             1        < 0.1%
14.46504801             1        < 0.1%
19.15597246             1        < 0.1%
6.853713122             1        < 0.1%
21.84200819             1        < 0.1%
Other values (14378)    14378    99.9%

Value                   Count    Frequency (%)
6.11079902              1        < 0.1%
6.138489053             1        < 0.1%
6.229944715             1        < 0.1%
6.275124118             1        < 0.1%
6.297344012             1        < 0.1%

Value                   Count    Frequency (%)
212.5098035             1        < 0.1%
133.8552083             1        < 0.1%
129.3812603             1        < 0.1%
114.8338041             1        < 0.1%
109.1949151             1        < 0.1%

spectrum_filename
Categorical

UNIQUE

Distinct count: 14388
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 112.4 KiB

d7de9b0df9f02952cae6.dat    1
24996eb54e2d3a0f707b.dat    1
6daa3504fd7e483e77de.dat    1
efea6724b526b77cb091.dat    1
ce7a67e494107e858e80.dat    1
Other values (14383)        14383

Value                       Count    Frequency (%)
d7de9b0df9f02952cae6.dat    1        < 0.1%
24996eb54e2d3a0f707b.dat    1        < 0.1%
6daa3504fd7e483e77de.dat    1        < 0.1%
efea6724b526b77cb091.dat    1        < 0.1%
ce7a67e494107e858e80.dat    1        < 0.1%
583494b7371de0107cf2.dat    1        < 0.1%
47db4b0331ef53573b5c.dat    1        < 0.1%
6537e47ed0bd08217093.dat    1        < 0.1%
4d4765fe4d85ab2ebf16.dat    1        < 0.1%
1aaa3c0a55ed1c5cdd64.dat    1        < 0.1%
Other values (14378)        14378    99.9%

Length

Max length: 24
Median length: 24
Mean length: 24
Min length: 24

spectrum_id
Categorical

UNIQUE

Distinct count: 14388
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 112.4 KiB

ef4b6552336a0297b903    1
fe21aab0e21d93aab0f8    1
ec4dd04afe54a8ff63ef    1
7519421ac11ff31ff432    1
d339e935dcb2cf5ffdb5    1
Other values (14383)    14383

Value                   Count    Frequency (%)
ef4b6552336a0297b903    1        < 0.1%
fe21aab0e21d93aab0f8    1        < 0.1%
ec4dd04afe54a8ff63ef    1        < 0.1%
7519421ac11ff31ff432    1        < 0.1%
d339e935dcb2cf5ffdb5    1        < 0.1%
7c6d2157c6cb1754a987    1        < 0.1%
1ce4c5eb73de25ba5f41    1        < 0.1%
8af4ada13cb8d4834372    1        < 0.1%
39177776a09c0eade3d5    1        < 0.1%
bd62baac551534f7966f    1        < 0.1%
Other values (14378)    14378    99.9%

Length

Max length: 20
Median length: 20
Mean length: 20
Min length: 20

target
Boolean

MISSING

Distinct count: 2
Unique (%): < 0.1%
Missing: 6952
Missing (%): 48.3%
Memory size: 112.4 KiB

0            7200
1            236
(Missing)    6952

Value        Count    Frequency (%)
0            7200     50.0%
1            236      1.6%
(Missing)    6952     48.3%
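The frequencies above are relative to all 14388 rows, including the unlabeled ones. As a small sketch, the share of each class among the labeled rows only can be computed from the same counts; the variable names are illustrative.

```python
# Counts copied from the target frequency table above.
n_negative = 7200
n_positive = 236
n_missing = 6952

n_total = n_negative + n_positive + n_missing
n_labeled = n_negative + n_positive

# Share of each class among rows that actually carry a label.
positive_share = n_positive / n_labeled
negative_share = n_negative / n_labeled

print(f"{n_labeled} labeled rows, {100 * positive_share:.1f}% positive")
```

Among labeled rows the positive class is a small minority, which is worth keeping in mind when reading the correlations with target.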
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
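The definition above translates directly into code. This is a minimal pure-Python sketch (the function name is illustrative, and it is not the report generator's own implementation). The 1/n factors in the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance and spreads; the common 1/n factor cancels out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (spread_x * spread_y)

xs = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(xs, [2 * x + 5 for x in xs]))   # rising line
print(pearson_r(xs, [-3 * x + 1 for x in xs]))  # falling line
```

Changing the slope or intercept of the linear relation leaves the magnitude of r unchanged, which is the location and scale invariance mentioned above.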

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
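As a sketch of the rank-based calculation described above: convert each variable to ranks, then apply the Pearson formula to the ranks. The helper names are illustrative, and giving tied values the average of their ranks is an assumption (it is the common convention).

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A cubic relation is nonlinear but monotonic.
xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))
```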

Kendall's τ

Similar to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
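The pair-counting definition above corresponds to the variant usually called tau-a. Here is a minimal sketch of it (the function name is illustrative). Library implementations often default to a tie-adjusted variant (tau-b), so values can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs, with no adjustment for ties."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie; it counts toward neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant
```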

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
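A minimal sketch of a bias-corrected Cramér's V, assuming the correction is the one from Bergsma (2013): the chi-squared statistic per observation is reduced by its expected value under independence, and the table dimensions are shrunk accordingly. The function name and the list-of-rows table layout are illustrative, no continuity correction is applied, and this is not the report generator's own code.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    r = len(table)       # number of rows
    k = len(table[0])    # number of columns
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    k_corr = k - (k - 1) ** 2 / (n - 1)
    r_corr = r - (r - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)

print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent table
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated table
```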

Missing values

Sample

First rows

beta  chip_id  exc_wl  layout_a  layout_x  layout_y  params0  params1  params2  params3  params4  params5  params6  pos_x  rms  spectrum_filename  spectrum_id  target
02.521298e-0279ad4647da6de6425abf85023614030.8085895.811802e+021037.7147521.53142322469.6516411032.3172688.2956101313.081010.028668b2e223339f4abce9b400.dat000da4633378740f1ee80.0
13.435612e-0179ad4647da6de6425abf7803016891.3008971.740582e+041080.5104524.76623333257.1231751077.4688558.018225159.41507.948485e2f150a503244145e7ce.dat000ed1a5a9fe0ad2b7dd0.0
22.348528e-15c695a1e61e002b34e55678013429106.6429461.000000e-101119.4644382.00000042579.8679131378.88333811.687417-610.768810.7398593d58b7ccaee157979cf0.dat0016e3322c4ce0700f9a0.0
32.183921e-01c695a1e61e002b34e556780232139306.9336741.099486e+041139.8550675.19869239349.7417031145.2128499.4450291214.618010.379948ed3641184d3b7c0ae703.dat00256bd0f8c6cf5f59c80.0
44.176962e-01c695a1e61e002b34e5567800458546.1332562.227622e+041120.9183375.66801231054.9286731117.1077827.658710-257.61618.3165504c63418d39f86dfab9bb.dat003483ee5ae313d375900.0
52.913505e-156718e7f83c824b1e436d7803265016.5974691.000000e-101090.0149992.00000034322.9237961217.30593411.552158988.16807.752002cf8657e1503943995dbe.dat0037f18f5aaec409bef70.0
62.241769e-020b9dbf13f938efd5717f780047118-61.5814425.348651e+031288.5789351.211405233242.0278031285.6088649.857559-190.857216.3928557a226cdd3178326a87f1.dat0041c80d2bb0ad8f0c1e0.0
71.324279e-0284b788fdc5e779f8a0df85036118165.8748227.272072e+021285.7742951.01836554186.2429301281.5932549.072799355.043516.623362bb142787b681af27ac86.dat0046ee889a24134bd3510.0
81.543495e-01118c70535bd753a866157802236057.5590505.012301e+031512.4798064.75993027461.3998041506.01722614.163725899.427711.089719662d2f3542adf199f3a2.dat0050161961b3e2e398de0.0
98.097873e-1684b788fdc5e779f8a0df85022268-15.4575321.000000e-101126.9756782.000000123489.2183781430.43555014.764868865.54778.6468492c576da41d645d597091.dat005c0d491a359ccff5830.0

Last rows

beta  chip_id  exc_wl  layout_a  layout_x  layout_y  params0  params1  params2  params3  params4  params5  params6  pos_x  rms  spectrum_filename  spectrum_id  target
143780.486590a005efe42b620a737e7e85023347210.0409945958.1781471089.6509162.9687196.286572e+031086.6997205.1746571244.48507.916921cf32faa5f7fd96ab9b4c.datffb380926d97f0c17e6eNaN
143790.2410963c2948d0a755e5ff99f6780234184145.5152876244.9918591502.8360822.9633621.965748e+041505.07865311.5044591247.067011.7703369418b8210c2e55cf2af9.datffb8e22c7abd6d35da70NaN
143800.417445a005efe42b620a737e7e850139183460.70671719726.3329601088.9368264.0500892.752859e+041087.7600999.556193-448.01549.976327ab29262a63382cb75f4e.datffd320b432ea04daf25bNaN
143811.000000a005efe42b620a737e7e7801351421038.003573232553.1257531284.72643011.8095874.181811e-151599.994457232556.735929-576.867697.9540713528d345a9fef49494a9.datffd5a8c80f8d30ad37f8NaN
143820.0897273c2948d0a755e5ff99f68500347123.4186761064.0904601086.3856360.5000001.079508e+041086.4946856.690598-609.69069.7439958e75b09f759be1433b58.datffdd738462bbf44c8731NaN
143830.1162263c2948d0a755e5ff99f685032114150.3510501575.7437371087.5302210.5000001.198183e+041085.8642815.739515220.99719.893204a9309e1b871e8089dedb.datffe3f18bccea9eca0c4bNaN
143840.237499a005efe42b620a737e7e780316181285.0447495157.4877661153.6983214.5339691.655832e+041149.5581226.347408702.840419.91275301d6b771f9b18d2c8be5.datffe5dc9b0008f1686fbbNaN
143850.0261823c2948d0a755e5ff99f678012539201.5945801298.7873591137.0379324.3975214.830745e+041162.60008910.304924-897.36088.3470386dc212d4616d7e28ac68.datffe99ef3b8a4ffb5cbfdNaN
143860.094193a005efe42b620a737e7e7800395986.4156821536.9736851155.2524142.4376321.478039e+041154.47928117.361207-1599.42809.97249773db945d1ec8d0d97b51.datfff6557194ea0487af92NaN
143870.0081323c2948d0a755e5ff99f68503281048.644496446.1576451552.5154690.5000005.442098e+041559.17708914.4777101057.12809.881024d147cb4379e428a08ebf.datffffb084eeba6fd04e59NaN